The healthcare industry generates a vast amount of data, including information about patient care, hospital operations, and financial performance. The purpose of this project is to provide a comprehensive understanding of the quality of care provided by different hospitals in the US and of their operational efficiency. We use several data sources to examine factors such as overall ratings, maximum capacity, readmission rates, mortality rates, and staffing levels. The insights from this analysis are presented through a range of data visualizations so that they are easy to interpret. Together, the visualizations give a clear picture of the current state of the US hospital system and help identify areas for improvement.
Whether you are a healthcare provider, policymaker, or patient, this project provides information that can be used to make informed decisions about the quality of care available in different regions of the country. The report is divided into three parts that together tell a story about the healthcare industry in the US, with a focus on hospitals:
1. Location of Hospitals: How are the hospitals distributed across the country?
2. Resources in Hospitals: Whom do the hospitals belong to and what facilities do they provide?
3. Quality of Hospitals: How are the hospitals rated and what factors affect their rating?
The data set provided to us contains a list of over 7.5k hospitals in the USA with their latitude, longitude, city, county, state, type, ownership, capacity, etc. Apart from hospital capacity (number of beds), the data set contains little numerical content, so we supplement it with the following external data sets:
1. hospitals_general_info: In addition to the hospital names and states, which we use to merge this data set with the original one, it contains six quality attributes and the overall rating of each hospital, which help us assess the quality of hospitals.
2. population_data: For each of the 50 US states, this data set gives the total population and the population density. This lets us compare the count and distribution of hospitals against the population density of each state.
3. cms_hospital_patient_satisfaction_2016: This data set contains patient reviews of US hospitals, which we correlate with the hospitals' overall ratings.
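The merge on hospital name and state described in item 1 can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the report: the field names (`name`, `state`, `overall_rating`), the sample records, and the normalization rule are all assumptions made for the example.

```python
def normalize(text):
    """Upper-case and collapse whitespace so keys match across data sets."""
    return " ".join(text.upper().split())


def merge_on_name_and_state(hospitals, ratings):
    """Left-join ratings onto hospitals using (name, state) as the key."""
    ratings_by_key = {
        (normalize(r["name"]), normalize(r["state"])): r for r in ratings
    }
    merged = []
    for hospital in hospitals:
        key = (normalize(hospital["name"]), normalize(hospital["state"]))
        match = ratings_by_key.get(key)
        row = dict(hospital)
        # Unmatched hospitals are kept, with a missing rating.
        row["overall_rating"] = match["overall_rating"] if match else None
        merged.append(row)
    return merged


# Hypothetical sample records, for illustration only.
hospitals = [
    {"name": "Example General Hospital", "state": "TX", "beds": 120},
    {"name": "Sample  Medical Center", "state": "CA", "beds": 80},
]
ratings = [
    {"name": "EXAMPLE GENERAL HOSPITAL", "state": "tx", "overall_rating": 4},
]

merged = merge_on_name_and_state(hospitals, ratings)
for row in merged:
    print(row["name"], row["state"], row["overall_rating"])
```

A left join keeps every hospital from the original list, so hospitals without a rating still appear in the location charts and are only excluded where a rating is needed.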
A choropleth map is a thematic map that represents statistical data through color. It divides a geographical area into enumeration units (here, states) that are colored, shaded, or patterned in relation to a data variable. From this choropleth map it is evident that Texas has the highest number of hospitals in the USA, followed by California.
Leaflet is an open-source JavaScript library for creating interactive online maps. The identically named R package makes it possible to create these maps in R as well. Here we use a leaflet map to pin down the specific locations of hospitals with an emergency ward in Texas, the state with the most hospitals. The University Hospital in San Antonio has the largest capacity, with 1034 beds. At a high level, most of these hospitals are in the city of Dallas, followed by Houston.
The grouped bar chart compares the count of hospitals grouped first by state and then by ownership. The four states with the most hospitals are Texas, California, Florida, and Ohio, in that order. In Texas and Florida, proprietary hospitals form the largest ownership group, with 436 and 155 hospitals respectively. In California and Ohio, by contrast, non-profit organizations run the most hospitals, with 284 and 171 respectively. We can also infer that government-owned hospitals are the least numerous in these four states.
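The counts behind a grouped bar chart of this kind come from grouping the hospital records by state and ownership. The sketch below shows one way to do this in plain Python; the field names and the six sample records are hypothetical and are not taken from the data set.

```python
from collections import Counter


def count_by_state_and_ownership(hospitals):
    """Count hospitals for every (state, ownership) pair."""
    return Counter((h["state"], h["ownership"]) for h in hospitals)


def dominant_ownership(counts):
    """Return the most common ownership type, with its count, per state."""
    best = {}
    for (state, ownership), n in counts.items():
        if state not in best or n > best[state][1]:
            best[state] = (ownership, n)
    return best


# Hypothetical records; the real data set has over 7.5k rows.
hospitals = [
    {"state": "TX", "ownership": "PROPRIETARY"},
    {"state": "TX", "ownership": "PROPRIETARY"},
    {"state": "TX", "ownership": "NON-PROFIT"},
    {"state": "CA", "ownership": "NON-PROFIT"},
    {"state": "CA", "ownership": "NON-PROFIT"},
    {"state": "CA", "ownership": "GOVERNMENT"},
]

counts = count_by_state_and_ownership(hospitals)
print(dominant_ownership(counts))
```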
The pie chart, or circle chart, summarizes a set of nominal data by showing the share of each value of a given variable (the percentage distribution in this case). Here we can observe how all the hospitals in the US are classified according to the 10 types mentioned in the data set. The chart shows that 61.7% of the hospitals in the USA belong to the General Acute Care type. The second most common type is Critical Access, with 14.3% of the total count. Chronic Disease hospitals are the least common in the USA, with a share of only 0.217% of the total count.
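The percentage shares in a pie chart like this are simply each type's count divided by the total number of hospitals. A minimal sketch, using a made-up list of four type labels rather than the real data:

```python
from collections import Counter


def type_shares(hospital_types):
    """Return the percentage share of each hospital type."""
    counts = Counter(hospital_types)
    total = sum(counts.values())
    return {t: 100 * n / total for t, n in counts.items()}


# Hypothetical type labels, for illustration only.
sample_types = [
    "GENERAL ACUTE CARE",
    "GENERAL ACUTE CARE",
    "GENERAL ACUTE CARE",
    "CRITICAL ACCESS",
]

shares = type_shares(sample_types)
for hospital_type, share in shares.items():
    print(f"{hospital_type}: {share:.1f}%")
```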
A stacked area chart is an extension of the basic area chart that displays the values of several groups on the same graphic. The stacked area chart here shows the number of beds that the hospitals have, grouped by ownership. It is evident that the hospitals owned by non-profit organizations and state governments have the most beds in the country.
A scatter plot shows the relationship between two quantitative variables measured for the same individuals (here, the states). From the plot it is evident that Texas has the highest ratio of hospitals to population density, meaning it has the most hospitals relative to the population density of the state. New Jersey, on the other hand, has the lowest ratio of hospital count to population density, with 135 hospitals and a population density of 1283.4, which makes it the least favorable case among the states.
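The ratio discussed here is the hospital count divided by the population density of the state. The sketch below computes it for New Jersey using the two figures quoted in the text; the second entry, "Example State", is a hypothetical placeholder added only so that there is something to compare against.

```python
def hospitals_per_density(hospital_count, population_density):
    """Ratio of hospital count to population density for one state."""
    return hospital_count / population_density


states = {
    "New Jersey": (135, 1283.4),   # figures quoted in the text
    "Example State": (50, 100.0),  # hypothetical values for comparison
}

ratios = {
    name: hospitals_per_density(count, density)
    for name, (count, density) in states.items()
}
lowest = min(ratios, key=ratios.get)

print(f"New Jersey ratio: {ratios['New Jersey']:.3f}")
print(f"Lowest ratio: {lowest}")
```

Note that this ratio divides by density rather than by population, so geographically large states with low density score high even if their hospital count per resident is modest.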
A bubble chart is primarily used to show relationships between numeric variables. In this case we can observe three variables: the total capacity of each state (total number of beds in its hospitals), the population density, and the state itself. In line with the inference from 2.3, we again see that Texas has the most beds relative to its population density.
An alluvial chart shows the patterns and flows between categories in a data set. In the above alluvial chart, at a high level, all six attributes appear to contribute roughly equally to the overall rating of the hospitals. On closer inspection, readmission contributes the most to the overall rating, with most of the five-star hospitals rated above the national average on readmission.
A radar chart displays multivariate data of three or more quantitative variables on axes that start from the same point. Here we can see the responses of the patients in a survey and correlate them with the overall rating of the hospital. The five lines in the radar chart show the average across all hospitals grouped by rating. We can observe that cleanliness is the most important factor in a hospital getting a 5-star rating: nearly all 5-star hospitals have a near-perfect cleanliness score. Pain management, on the other hand, contributes the least to 5-star ratings, and care transition is the most important factor in hospitals getting 1-star ratings.
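Each line in a radar chart of this kind is the mean of every survey dimension across the hospitals that share a star rating. A minimal sketch of that grouping step follows; the field names (`rating`, `cleanliness`, `pain_management`) and the score values are hypothetical and do not come from the CMS data set.

```python
from collections import defaultdict
from statistics import mean


def mean_scores_by_rating(rows, dimensions):
    """Average each survey dimension over hospitals with the same rating."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["rating"]].append(row)
    return {
        rating: {d: mean(r[d] for r in members) for d in dimensions}
        for rating, members in grouped.items()
    }


# Hypothetical survey scores, for illustration only.
rows = [
    {"rating": 5, "cleanliness": 5.0, "pain_management": 3.0},
    {"rating": 5, "cleanliness": 4.0, "pain_management": 4.0},
    {"rating": 1, "cleanliness": 2.0, "pain_management": 2.0},
]

averages = mean_scores_by_rating(rows, ["cleanliness", "pain_management"])
for rating in sorted(averages):
    print(rating, averages[rating])
```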
A stacked bar chart shows how each total is composed of several categories, in either relative or absolute terms. Here we can observe how the values of each attribute break down into above, below, or same as the national average. Most of the hospitals in the USA have above-average effectiveness, while patient experience and timeliness are the attributes on which hospitals perform worst. At a high level we can conclude that mortality and efficient use of medical imaging are more or less the same as the national average for US hospitals.
It is evident from both leaflet maps that most of the five-star hospitals are scattered throughout the central-east region of America, with the highest concentration in Illinois and the lowest on the west coast, where only one hospital (located in Santa Barbara) has a 5-star rating. The one-star hospitals, on the other hand, are spread more widely, with most of them located in New York and California.
The distribution of hospitals across the states is not even, and a number of factors influence the number of hospitals in a particular state, including population size and density, socioeconomic status, rural versus urban location, and access to healthcare resources. We found that the states with the largest number of hospitals are Texas, California, and Florida, while states such as Wyoming and Alaska have far fewer. The ownership structure of hospitals in the United States is diverse and includes non-profit organizations, government entities, for-profit corporations, and partnerships. There are a significant number of privately owned hospitals in Texas and Florida. Non-profit organizations are also among the largest hospital owners in the US. These hospitals focus on serving their communities and providing medical care to patients in need, regardless of their ability to pay. In California and Ohio, most hospitals are owned by non-profit organizations. Texas has a large number of hospitals, particularly in its major cities such as Houston and Dallas. These cities are home to some of the largest medical centers in the world, which offer a wide range of services, from routine care to specialized treatments for complex medical conditions.
In the second section, we explored whom the hospitals belong to and what resources they provide. First, we looked at the different types of hospitals and their distribution across the country: 61.7% of the hospitals in the USA belong to the General Acute Care type, while Chronic Disease hospitals are the least common, with a share of only 0.217% of the total count. Second, the stacked area chart shows the distribution of hospitals by ownership and capacity. We infer that the hospitals owned by non-profit organizations and state governments have the most beds in the country. Third, a scatter plot compares the population density of each state with its number of hospitals. From the plot it is evident that Texas has the highest ratio of hospitals to population density, while New Jersey has the lowest. This suggests that New Jersey may need to invest more in building hospitals to improve this ratio. Finally, the bubble chart shows something similar: the number of beds against the population density for each state. Again we can see that Texas has the most beds relative to its population density.
The third section focuses on the quality of the hospitals. We consider the overall rating of the hospitals and the attributes obtained from the hospitals_general_info data set, together with the patient reviews from the CMS data set. The alluvial chart shows that almost all attributes contribute equally to the overall rating, with readmission contributing the most. From the radar chart we can conclude that cleanliness is the factor that helps a hospital the most in getting a five-star rating. Among the attributes, effectiveness is mostly above average for US hospitals, while patient experience and timeliness are the attributes on which they perform worst. Lastly, most of the 5-star hospitals are located in and around the state of Illinois, while there is only one 5-star hospital on the west coast. New York and California have the most 1-star hospitals in the US.
Overall, our analysis gives a fair idea of the geographical distribution of hospitals, their resource availability, and the factors that determine the quality of a hospital.